OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
Abstract
Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.
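As a rough illustration of the idea in the abstract (not the authors' implementation), the decomposed discriminator can be sketched as a mixture of per-option reward networks weighted by a gating network over states, trained adversarially on expert versus policy states alone. The PyTorch sketch below uses made-up names (RewardOptions) and dimensions:

import torch
import torch.nn as nn

class RewardOptions(nn.Module):
    # Hypothetical mixture-of-experts discriminator: a gating network over
    # states softly assigns each state to one of n_options reward networks.
    def __init__(self, state_dim, n_options, hidden=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_options), nn.Softmax(dim=-1))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
            for _ in range(n_options))

    def forward(self, states):
        w = self.gate(states)                                  # (B, K) option weights
        r = torch.cat([e(states) for e in self.experts], -1)   # (B, K) per-option scores
        return (w * r).sum(-1)                                 # mixed discriminator logit

disc = RewardOptions(state_dim=11, n_options=4)                # illustrative sizes
d_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

# One adversarial update from states alone: expert states -> 1, policy states -> 0.
expert_s, policy_s = torch.randn(32, 11), torch.randn(32, 11)
loss = bce(disc(expert_s), torch.ones(32)) + bce(disc(policy_s), torch.zeros(32))
d_opt.zero_grad(); loss.backward(); d_opt.step()

The policy side would mirror this structure with per-option sub-policies updated against the learned mixed reward.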
Similar papers
Generative Adversarial Imitation Learning
Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert’s cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a...
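To make the contrast with the indirect recover-cost-then-plan pipeline concrete, a GAIL-style loop is usually sketched like the hypothetical helper below: a discriminator separates expert (state, action) pairs from policy pairs, and its output becomes a surrogate reward for an ordinary policy-gradient update. Names (gail_step) and network sizes are illustrative, not from the paper:

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 11, 3                                    # illustrative sizes
D = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
d_opt = torch.optim.Adam(D.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def gail_step(expert_sa, policy_sa):
    # Discriminator update: expert (s, a) pairs -> 1, policy pairs -> 0.
    loss = (bce(D(expert_sa).squeeze(-1), torch.ones(len(expert_sa)))
            + bce(D(policy_sa).squeeze(-1), torch.zeros(len(policy_sa))))
    d_opt.zero_grad(); loss.backward(); d_opt.step()
    # Surrogate reward -log(1 - sigmoid(logit)) = softplus(logit), handed to a
    # policy-gradient optimizer (e.g. TRPO/PPO) in place of a hand-written cost.
    with torch.no_grad():
        return F.softplus(D(policy_sa)).squeeze(-1)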
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Reinforcement learning provides a powerful and general framework for decision making and control, but its application in practice is often hindered by the need for extensive feature and reward engineering. Deep reinforcement learning methods can remove the need for explicit engineering of policy or value features, but still require a manually specified reward function. Inverse reinforcement lea...
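One common way this is instantiated (as in AIRL) is to give the discriminator a special form whose logit is a learned reward minus the policy's log-probability, so the reward function is recovered explicitly rather than entangled with the policy. A minimal sketch, with a hypothetical airl_logit helper and stand-in inputs:

import torch
import torch.nn as nn

# f(s, a) is the learned reward; dimensions are illustrative.
f = nn.Sequential(nn.Linear(11 + 3, 64), nn.Tanh(), nn.Linear(64, 1))

def airl_logit(sa, logp_a):
    # D = exp(f) / (exp(f) + pi(a|s))  <=>  logit(D) = f(s, a) - log pi(a|s),
    # so training D with binary cross-entropy shapes f into a reward.
    return f(sa).squeeze(-1) - logp_a

sa = torch.randn(32, 11 + 3)
logp = torch.randn(32)            # stand-in for log pi(a|s) from the current policy
logit = fitted = airl_logit(sa, logp)   # feed to BCEWithLogitsLoss as in the GAIL sketch above

After training, f is kept as the recovered reward, independent of the policy that produced it.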
What’s good for the goose is good for the GANder: Comparing Generative Adversarial Networks for NLP
Generative Adversarial Nets (GANs), which use discriminators to help train a generative model, have been successful particularly in computer vision for generating images. However, there are many restrictions on their application to natural language tasks; mainly, it is difficult to back-propagate through discrete-valued random variables. Yet recent publications have applied GANs with promising resu...
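The obstacle is that sampling a token is non-differentiable. One common workaround (one of several compared in this line of work; the blurb does not say which the cited paper uses) is the straight-through Gumbel-softmax estimator:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)   # generator scores over a 10-token vocab

# Sampling/argmax over tokens is non-differentiable; gumbel_softmax with
# hard=True emits one-hot samples forward but a soft surrogate backward.
tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)
loss = tokens.sum()                # stand-in for a discriminator score on the sample
loss.backward()
print(logits.grad is not None)     # True: gradients reach the generator anyway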
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
As a new way of training generative models, Generative Adversarial Nets (GANs), which use a discriminative model to guide the training of the generative model, have enjoyed considerable success in generating real-valued data. However, they have limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it di...
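The route SeqGAN's title points to is to treat the generator as a policy and the discriminator's score on a finished sequence as a scalar reward, updating with REINFORCE instead of back-propagating through the tokens. A minimal single-step sketch, where the reward is a stand-in for a real discriminator:

import torch

logits = torch.randn(4, 10, requires_grad=True)       # per-position token scores
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()                                 # discrete tokens: no gradient path
reward = torch.rand(4)                                 # stand-in for D's score per sample
loss = -(dist.log_prob(tokens) * reward).mean()        # REINFORCE surrogate objective
loss.backward()                                        # gradients flow via log_prob, not tokens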
Multi-agent Generative Adversarial Imitation Learning
We propose a new framework for multi-agent imitation learning in general Markov games, building on a generalized notion of inverse reinforcement learning. We introduce a practical multi-agent actor-critic algorithm with good empirical performance. Our method can be used to imitate complex behaviors in high-dimensional environments with multiple cooperative or competitive agents.
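A plausible minimal reading of such a setup (not the paper's exact algorithm, which couples agents through the Markov game) is one discriminator per agent, each separating that agent's expert (observation, action) pairs from its current policy's pairs, with policies then trained actor-critic style against those learned rewards. Names and sizes below are hypothetical:

import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 8, 2                   # illustrative sizes
discs = nn.ModuleList(
    nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    for _ in range(n_agents))
bce = nn.BCEWithLogitsLoss()

def disc_loss(i, expert_sa, policy_sa):
    # Per-agent discriminator i: that agent's expert pairs -> 1, policy pairs -> 0.
    return (bce(discs[i](expert_sa).squeeze(-1), torch.ones(len(expert_sa)))
            + bce(discs[i](policy_sa).squeeze(-1), torch.zeros(len(policy_sa))))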
Journal: CoRR
Volume: abs/1709.06683
Publication date: 2017